Determining Native Language and Deception Using Phonetic Features and Classifier Combination
Authors
Abstract
For several years, the Interspeech ComParE Challenge has focused on paralinguistic tasks of various kinds. In this paper we focus on the Native Language and the Deception Sub-Challenges of ComParE 2016, where the goals are to identify the native language of the speaker and to recognize deceptive speech, respectively. As both tasks can be treated as classification problems, we experiment with several state-of-the-art machine learning methods (Support Vector Machines, AdaBoost.MH and Deep Neural Networks), and also test a simple yet robust combination method. Furthermore, we assume that the native language of the speaker affects the pronunciation of specific phonemes in the language being spoken. To exploit this, we extract phonetic features for the Native Language task. For the Deception Sub-Challenge, we compensate for the highly unbalanced class distribution by instance re-sampling. With these techniques we are able to significantly outperform the baseline SVM on the unpublished test set.
Similar resources
Comparison of Spectral Methods for Spoken Language Identification
Automatic spoken language identification is the task of identifying a language from the speech signal. Language identification systems can be divided into two categories: spectral-based methods and phonetic-based methods. In the former, short-time characteristics of the speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...
Feature Space Selection and Combination for Native Language Identification
We describe the submissions made by the National Research Council Canada to the Native Language Identification (NLI) shared task. Our submissions rely on a Support Vector Machine classifier, various feature spaces using a variety of lexical, spelling, and syntactic features, and on a simple model combination strategy relying on a majority vote between classifiers. Somewhat surprisingly, a classi...
Native Language Identification using Phonetic Algorithms
In this paper, we discuss the results of the IUCL system in the NLI Shared Task 2017. For our system, we explore a variety of phonetic algorithms to generate features for Native Language Identification. These features are contrasted with one of the most successful types of features in NLI, character n-grams. We find that although phonetic features do not perform as well as character n-grams alon...
Language- and Talker-dependent Variation in Global Features of Native and Non-native Speech
We motivate and present a corpus of scripted and spontaneous speech in both the native and the non-native language of talkers from various language backgrounds. Using corpus recordings from 11 native English and 11 late Mandarin-English bilinguals we compared speech timing across native English, native Mandarin, and Mandarin-accented English. Findings showed similarities across native Mandarin ...
Identifying Individual Differences in Gender, Ethnicity, and Personality from Dialogue for Deception Detection
When automatically detecting deception, it is important to model individual differences across speakers. We explore the automatic identification of individual traits such as gender, native language, and personality, using acoustic-prosodic and lexical features from an initial non-deceptive dialogue. We also explore predicting success at deception and at deception detection, using the same featu...